-
Notifications
You must be signed in to change notification settings - Fork 385
Add unit test for HQQ #3403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add unit test for HQQ #3403
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3403
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 1 New FailureAs of commit f7323e3 with merge base 5d519d9 ( NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
❌ 🤖 pytorchbot command failed: |
|
@pytorchbot label "topic: performance" |
torchao/_models/_eval.py
Outdated
| @property | ||
| def max_gen_toks(self): | ||
| return 50 | ||
| return 512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is updated because the GSM8K task is CoT reasoning (long-context)
|
cc @metascroy can you take a look into this? I will move to E2E support for SINQ after this PR as we talked with #3156 (comment); we can add SINQ to this e2e benchmark I feel. |
|
Thanks! The unit test looks good to me. I'll let @jainapurva review the benchmark script, as I'm less familiar with that part of the codebase and if there are coding standards that are usually followed. |
|
intx is targeting mobile right? so testing performance on server doesn't seem to be helpful? |
@jerryzh168 Sorry I misunderstood the difference between intx (ExecuTorch/mobile) and int4 (CUDA), and actually my use case is CUDA (not mobile) deployment. Working more for int4 should be better to me, so I will move to SINQ support for int4 (as talked with #3106 (comment)). |
@metascroy Updated PR for only adding the HQQ unit test (drop intx benchmark). Please check the above comment. Let me know if a dropped benchmark would be needed for ExecuTorch; if needed, I will submit a PR for ExecuTorch. |
The benchmark isn't necessary for ExecuTorch. |
|
PR looks good! If unit tests pass, we can merge. |
| iters=20, | ||
| ) | ||
|
|
||
| # Check quantized data shape and dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test doesn't look very hard to pass? would it be better to make it stronger by comparing with the results of
| "_choose_qparams_and_quantize_affine_hqq", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding intx+hqq to
ao/benchmarks/quantization/create_quantized_model.py
Lines 30 to 35 in 8b2367b
| elif s == "int4_groupwise_hqq_weight_only": | |
| return Int4WeightOnlyConfig( | |
| group_size=32, | |
| int4_packing_format="tile_packed_to_4d", | |
| int4_choose_qparams_algorithm="hqq", | |
| ) |
instead of comparing them here? Both are different HQQ implementation (optmized for NVIDIA vs. no HW optimization), so a sanity check might not work well here. If we need to compare them, we can do some experiments with the new eval script I feel.
| self.assertFalse(torch.isnan(reconstructed).any()) | ||
| self.assertEqual(reconstructed.shape, input.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here can we compare reconstructed with original input (with sqnr) to make sure they are close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to compare reconstruction error, thanks.
Summary:
The HQQ unit test is added because there was only an e2e test.
Test Plan:
Related Issue/PR: #3156 (comment)